Abstract: Given a general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ with
countably infinite source alphabet and a general channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ with arbitrary abstract channel
input/channel output alphabets, we study the joint source-channel coding
problem from the information-spectrum point of view. First, we generalize
Feinstein's lemma (direct part) and Verdú-Han's lemma (converse part) so as
to be applicable to the general joint source-channel coding problem. Based
on these lemmas, we establish a sufficient condition as well as a necessary
condition for the source $\mathbf{V}$ to be reliably transmissible over the
channel $\mathbf{W}$ with asymptotically vanishing probability of error. It
is shown that our sufficient condition is equivalent to the sufficient
condition derived by Vembu, Verdú and Steinberg [9], whereas our necessary
condition is shown to be stronger than or equivalent to the necessary
condition derived by them. It turns out, as a direct consequence, that the
"separation principle" in a relevantly generalized sense holds for a wide
class of sources and channels, as was shown in a quite different manner by
Vembu, Verdú and Steinberg [9]. It should also be remarked that a nice
duality is found between our necessary and sufficient conditions, whereas we
cannot fully enjoy such a duality between the necessary condition and the
sufficient condition by Vembu, Verdú and Steinberg [9]. In addition, we
demonstrate a sufficient condition as well as a necessary condition for the
$\varepsilon$-transmissibility ($0 \le \varepsilon < 1$). Finally, the
separation theorem of the traditional standard form is shown to hold for the
class of sources and channels that satisfy the semi-strong converse property.

Index terms: general source, general channel, joint source-channel
coding, separation theorem, information-spectrum, transmissibility,
generalized Feinstein's lemma, generalized Verdú-Han's lemma







1 Introduction 



Given a source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ and a channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, joint source-channel coding means that
the encoder maps the output from the source directly to the channel input
(one step encoding), where the probability of decoding error is required to
vanish as block-length $n$ tends to $\infty$. In usual
situations, however, the joint source-channel coding can be decomposed into 
separate source coding and channel coding (two step encoding). This two step 
encoding does not cause any disadvantages from the standpoint of asymp- 
totically vanishing error probabilities, provided that the so-called Separation 
Theorem holds. 

Typically, the traditional separation theorem, which we call the separation
theorem in the narrow sense, states that if the infimum $R_f(\mathbf{V})$ of
all achievable fixed-length coding rates for the source $\mathbf{V}$ is
smaller than the capacity $C(\mathbf{W})$ for the channel $\mathbf{W}$, then
the source $\mathbf{V}$ is reliably transmissible by two step encoding over
the channel $\mathbf{W}$; whereas if $R_f(\mathbf{V})$ is larger than
$C(\mathbf{W})$ then the reliable transmission is impossible. While the
former statement is always true for any general source $\mathbf{V}$ and any
general channel $\mathbf{W}$, the latter statement is not always true. A
very natural question then arises: for what class of sources and channels,
and in what sense, does the separation theorem hold in general?

Shannon [1] first showed that the separation theorem holds for the class of
stationary memoryless sources and channels. Since then, this theorem has
received extensive attention from a number of researchers who have attempted
to prove versions that apply to more and more general classes of sources and
channels. Among others, for example, Dobrushin [4], Pinsker [5], and Hu [6]
have studied the separation theorem problem in the framework of
information-stable sources and channels.

Recently, on the other hand, Vembu, Verdú and Steinberg [9] have put forth
this problem in a much more general information-spectrum context with
general source $\mathbf{V}$ and general channel $\mathbf{W}$. From the
viewpoint of information spectra, they have generalized the notion of
separation theorem and shown that, usually in many cases even with
$R_f(\mathbf{V}) > C(\mathbf{W})$, it is possible to reliably transmit the
output of the source $\mathbf{V}$ over the channel $\mathbf{W}$. Furthermore,
in terms of information spectra, they have established a sufficient
condition for the transmissibility as well as a necessary condition. It
should be noticed here that, in this general joint source-channel coding
situation, what indeed matters is not the validity problem of the
traditional type of separation theorems but the derivation problem of
necessary and/or sufficient conditions for the transmissibility from the
information-spectrum point of view.

However, while their sufficient condition looks simple and significantly 
tight, their necessary condition does not look quite close to tight. 

The present paper was mainly motivated by the reasonable question of why the
forms of these two conditions look so different from one another. First, in
Section 3, the basic tools to answer this question are established, i.e.,
two fundamental lemmas: a generalization of Feinstein's lemma [2] and a
generalization of Verdú-Han's lemma [8], which provide the very basis for
the key results to be stated in the subsequent sections. These lemmas are of
dualistic information-spectrum forms, which is in nice accordance with the
general joint source-channel coding framework. In Section 4, given a general
source $\mathbf{V}$ and a general channel $\mathbf{W}$, we establish, in
terms of information-spectra, a sufficient condition (Direct theorem) for
the transmissibility as well as a necessary condition (Converse theorem).
The forms of these two conditions are very close to each other, and "fairly"
coincide with one another, provided that we dare disregard some relevant
asymptotically vanishing term.

Next, we equivalently rewrite these conditions in the forms useful to see
relations to the separation theorem. As a consequence, it turns out that a
separation-theorem-like equivalent of our sufficient condition just
coincides with the sufficient condition given by Vembu, Verdú and Steinberg
[9], whereas a separation-theorem-like equivalent of our necessary condition
is shown to be strictly stronger than or equivalent to the necessary
condition given by them. Here it is pleasing to observe that a nice duality
is found between our necessary and sufficient conditions, whereas we cannot
fully enjoy such a duality between the necessary condition and the
sufficient condition by Vembu, Verdú and Steinberg [9].

On the other hand, in Section 5, we demonstrate a sufficient condition as
well as a necessary condition for the $\varepsilon$-transmissibility, which
generalize the sufficient condition and the necessary condition shown in
Section 4. Finally, in Section 6, we restrict the class of sources and
channels to those that satisfy the strong converse property (or, more
generally, the semi-strong converse property) to show that the separation
theorem in the traditional sense holds for this class.

2 Basic Notation and Definitions



In this preliminary section, we prepare the basic notation and definitions 
which will be used in the subsequent sections. 

2.1 General Sources 

Let us first give here the formal definition of the general source. A
general source is defined as an infinite sequence
$\mathbf{V} = \{V^n = (V_1^{(n)}, \cdots, V_n^{(n)})\}_{n=1}^{\infty}$ of
$n$-dimensional random variables $V^n$ where each component random variable
$V_i^{(n)}$ $(1 \le i \le n)$ takes values in a countably infinite set
$\mathcal{V}$ that we call the source alphabet. It should be noted here that
each component of $V^n$ may change depending on block length $n$. This
implies that the sequence $\mathbf{V}$ is quite general in the sense that it
may not satisfy even the consistency condition as usual processes, where the
consistency condition means that for any integers $m, n$ such that $m < n$
it holds that $V_i^{(m)} \equiv V_i^{(n)}$ for all $i = 1, 2, \cdots, m$.
The class of sources thus defined covers a very wide range of sources
including all nonstationary and/or nonergodic sources (cf. Han and
Verdú [7]).

2.2 General Channels 

The formal definition of a general channel is as follows. Let
$\mathcal{X}, \mathcal{Y}$ be arbitrary abstract (not necessarily countable)
sets, which we call the input alphabet and the output alphabet,
respectively. A general channel is defined as an infinite sequence
$\mathbf{W} = \{W^n: \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$ of
$n$-dimensional probability transition matrices $W^n$, where
$W^n(\mathbf{y} \mid \mathbf{x})$
$(\mathbf{x} \in \mathcal{X}^n, \mathbf{y} \in \mathcal{Y}^n)$ denotes the
conditional probability of $\mathbf{y}$ given $\mathbf{x}$.* The class of
channels thus defined covers a very wide range of channels including all
nonstationary and/or nonergodic channels with arbitrary memory structures
(cf. Han and Verdú [7]).

*In the case where the output alphabet $\mathcal{Y}$ is abstract,
$W^n(\mathbf{y} \mid \mathbf{x})$ is understood to be the (conditional)
probability measure element $W^n(d\mathbf{y} \mid \mathbf{x})$ that is
measurable in $\mathbf{x}$.

Remark 2.1 A more reasonable definition of a general source is the
following. Let $\{\mathcal{V}_n\}_{n=1}^{\infty}$ be any sequence of
arbitrary source alphabets $\mathcal{V}_n$ (a countably infinite or abstract
set) and let $V_n$ be any random variable taking values in $\mathcal{V}_n$
$(n = 1, 2, \cdots)$. Then, the sequence
$\mathbf{V} = \{V_n\}_{n=1}^{\infty}$ of random variables $V_n$ is called a
general source (cf. Verdú and Han [10]). The above definition is a special
case of this general source with $\mathcal{V}_n = \mathcal{V}^n$
$(n = 1, 2, \cdots)$.

On the other hand, a more reasonable definition of the general channel is
the following. Let $\{W_n: \mathcal{X}_n \to \mathcal{Y}_n\}_{n=1}^{\infty}$
be any sequence of arbitrary probability transition matrices, where
$\mathcal{X}_n, \mathcal{Y}_n$ are arbitrary abstract sets. Then, the
sequence $\mathbf{W} = \{W_n\}_{n=1}^{\infty}$ of probability transition
matrices $W_n$ is called a general channel (cf. Han [11]). The above
definition is a special case of this general channel with
$\mathcal{X}_n = \mathcal{X}^n$, $\mathcal{Y}_n = \mathcal{Y}^n$
$(n = 1, 2, \cdots)$.

The results in this paper (Lemma 3.1, Lemma 3.2, Theorem 4.1, Theorem 4.2,
Theorem 4.3, Theorem 4.4, Theorem 5.1, Theorem 5.2 and Theorems 6.1 ~ 6.7)
remain valid in this more general setting with
$V^n, \mathcal{V}^n, \mathbf{V}$ and
$\mathcal{X}^n, \mathcal{Y}^n, W^n, \mathbf{W}$ replaced by
$V_n, \mathcal{V}_n, \mathbf{V}$ and
$\mathcal{X}_n, \mathcal{Y}_n, W_n, \mathbf{W}$, respectively.

In the sequel we use the convention that $P_Z(\cdot)$ denotes the
probability distribution of a random variable $Z$, whereas
$P_{Z \mid U}(\cdot \mid \cdot)$ denotes the conditional probability
distribution of a random variable $Z$ given a random variable $U$. □



2.3 Joint Source-Channel Coding 

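Let $\mathbf{V} = \{V^n = (V_1^{(n)}, \cdots, V_n^{(n)})\}_{n=1}^{\infty}$ be
any general source, and let
$\mathbf{W} = \{W^n(\cdot \mid \cdot): \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$
be any general channel. We consider an encoder
$\varphi_n: \mathcal{V}^n \to \mathcal{X}^n$ and a decoder
$\psi_n: \mathcal{Y}^n \to \mathcal{V}^n$, and put $X^n = \varphi_n(V^n)$.
Then, denoting by $Y^n$ the output from the channel $W^n$ due to the input
$X^n$, we have the obvious relation:

$$V^n \to X^n \to Y^n \quad \text{(a Markov chain)}. \tag{2.1}$$

The error probability $\varepsilon_n$ with code $(\varphi_n, \psi_n)$ is
defined by

$$\varepsilon_n \equiv \Pr\{V^n \ne \psi_n(Y^n)\}
  = \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})\,
    W^n(D^c(\mathbf{v}) \mid \varphi_n(\mathbf{v})), \tag{2.2}$$

where $D(\mathbf{v}) \equiv \{\mathbf{y} \in \mathcal{Y}^n \mid \psi_n(\mathbf{y}) = \mathbf{v}\}$
$(\forall \mathbf{v} \in \mathcal{V}^n)$ ($D(\mathbf{v})$ is called the
decoding set for $\mathbf{v}$) and "c" denotes the complement of a set. A
pair $(\varphi_n, \psi_n)$ with error probability $\varepsilon_n$ is simply
called a joint source-channel code $(n, \varepsilon_n)$.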

We now define the transmissibility in terms of joint source-channel codes
$(n, \varepsilon_n)$ as

Definition 2.1

Source $\mathbf{V}$ is transmissible over channel $\mathbf{W}$
$\Longleftrightarrow$ There exists an $(n, \varepsilon_n)$ code such that
$\lim_{n \to \infty} \varepsilon_n = 0$.

With this definition of transmissibility, in the following sections we shall
establish a sufficient condition as well as a necessary condition for the
transmissibility when we are given a general source $\mathbf{V}$ and a
general channel $\mathbf{W}$. These two conditions are very close to each
other and could actually be seen as giving "almost the same condition,"
provided that we dare disregard an asymptotically negligible term
$\gamma_n \to 0$ appearing in those conditions (cf. Section 4).

Remark 2.2 The quantity $\varepsilon_n$ defined by (2.2) is more
specifically called the average error probability, because it is averaged
with respect to $P_{V^n}(\mathbf{v})$ over all source outputs
$\mathbf{v} \in \mathcal{V}^n$. On the other hand, we may define another
kind of error probability by

$$\varepsilon_n \equiv \sup_{\mathbf{v}:\, P_{V^n}(\mathbf{v}) > 0}
  W^n(D^c(\mathbf{v}) \mid \varphi_n(\mathbf{v})), \tag{2.3}$$

which we call the maximum error probability. It is evident that the
transmissibility in the maximum sense implies the transmissibility in the
average sense. However, the converse is not necessarily true. To see this,
it suffices to consider the following simple example. Let the source,
channel input, channel output alphabets be $\mathcal{V}_n = \{0, 1, 2\}$,
$\mathcal{X}_n = \{1, 2\}$, $\mathcal{Y}_n = \{1, 2\}$, respectively; and
the (deterministic) channel $W_n: \mathcal{X}_n \to \mathcal{Y}_n$ be
defined by $W_n(j \mid i) = 1$ for $i = j$, $W_n(1 \mid 0) = 1$. Moreover,
let the source $V_n$ have probability distribution
$P_{V_n}(0) = \alpha_n$, $P_{V_n}(1) = P_{V_n}(2) = \frac{1 - \alpha_n}{2}$
($\alpha_n \to 0$ as $n \to \infty$). One of the best choices of possible
pairs of encoder-decoder
($\varphi_n: \mathcal{V}_n \to \mathcal{X}_n$, $\psi_n: \mathcal{Y}_n \to \mathcal{V}_n$),
either in the average sense or in the maximum sense, is such that
$\varphi_n(i) = i$ for $i = 1, 2$; $\varphi_n(0) = 1$; $\psi_n(i) = i$ for
$i = 1, 2$. Then, the average error probability is
$\varepsilon_n^a = \alpha_n \to 0$, while the maximum error probability is
$\varepsilon_n^m = 1$. Thus, in this case, the source $V_n$ is transmissible
in the average sense over the channel $W_n$, while it is not transmissible
in the maximum sense.

Hereafter, the probability $\varepsilon_n$ is understood to denote the
"average" error probability, unless otherwise stated. □
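
The example in Remark 2.2 can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper names `error_probabilities` and `toy_code` are hypothetical, and the choice $\alpha_n = 1/(n+1)$ is only an assumed instance of a sequence with $\alpha_n \to 0$. It evaluates the average error probability (2.2) and the maximum error probability (2.3) for the encoder-decoder pair described above.

```python
from fractions import Fraction


def error_probabilities(source_pmf, channel, encoder, decoder):
    """Return (average, maximum) error probability of a code, as in (2.2)/(2.3).

    source_pmf: dict v -> P(v)
    channel:    dict x -> dict y -> W(y|x)
    encoder:    dict v -> x
    decoder:    dict y -> v
    """
    per_symbol = {}
    for v in source_pmf:
        x = encoder[v]
        # W(D^c(v) | phi(v)): total probability of outputs not decoded to v.
        per_symbol[v] = sum(p for y, p in channel[x].items() if decoder[y] != v)
    average = sum(source_pmf[v] * per_symbol[v] for v in source_pmf)
    maximum = max(per_symbol[v] for v in source_pmf if source_pmf[v] > 0)
    return average, maximum


def toy_code(alpha):
    """The code of Remark 2.2 for a given value alpha of P(V = 0)."""
    source_pmf = {0: alpha, 1: (1 - alpha) / 2, 2: (1 - alpha) / 2}
    channel = {1: {1: 1, 2: 0}, 2: {1: 0, 2: 1}}  # noiseless on inputs {1, 2}
    encoder = {0: 1, 1: 1, 2: 2}
    decoder = {1: 1, 2: 2}
    return error_probabilities(source_pmf, channel, encoder, decoder)


if __name__ == "__main__":
    for n in (1, 10, 100):
        alpha_n = Fraction(1, n + 1)  # assumed sequence with alpha_n -> 0
        average, maximum = toy_code(alpha_n)
        print(n, average, maximum)
```

The average error equals `alpha_n` because only the source letter 0 is ever decoded incorrectly, while the maximum error stays at 1 for every `n`.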

3 Fundamental Lemmas



In this section, we prepare two fundamental lemmas that are needed in 
the next section in order to establish the main theorems (Direct part and 
Converse part). 

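Lemma 3.1 (Generalization of Feinstein's lemma) Given a general source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ and a general channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, let $X^n$ be any input random
variable taking values in $\mathcal{X}^n$ and $Y^n$ be the channel output
via $W^n$ due to the channel input $X^n$, where $V^n \to X^n \to Y^n$. Then,
for every $n = 1, 2, \cdots$, there exists an $(n, \varepsilon_n)$ code such
that

$$\varepsilon_n \le \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma \right\} + e^{-n\gamma}, \tag{3.1}$$

where† $\gamma > 0$ is an arbitrary positive number.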

Remark 3.1 In a special case where the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is uniformly distributed on the
message set $\mathcal{M}_n = \{1, 2, \cdots, M_n\}$, it follows that

$$\frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} = \frac{1}{n} \log M_n,$$

which implies that the entropy spectrum‡ of the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is exactly one point spectrum
concentrated on $\frac{1}{n} \log M_n$. Therefore, in this special case,
Lemma 3.1 reduces to Feinstein's lemma [2]. □



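†In the case where the input and output alphabets $\mathcal{X}, \mathcal{Y}$
are abstract (not necessarily countable),
$\frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}$ in (3.1) is understood to be
$g(Y^n \mid X^n)$, where
$g(\mathbf{y} \mid \mathbf{x}) \equiv \frac{W^n(d\mathbf{y} \mid \mathbf{x})}{P_{Y^n}(d\mathbf{y})}
 = \frac{W^n(d\mathbf{y} \mid \mathbf{x}) P_{X^n}(d\mathbf{x})}{P_{X^n}(d\mathbf{x}) P_{Y^n}(d\mathbf{y})}
 = \frac{P_{X^n Y^n}(d\mathbf{x}, d\mathbf{y})}{P_{X^n}(d\mathbf{x}) P_{Y^n}(d\mathbf{y})}$
is the Radon-Nikodym derivative that is measurable in
$(\mathbf{x}, \mathbf{y})$.

‡The probability distribution of $\frac{1}{n} \log \frac{1}{P_{V^n}(V^n)}$
is called the entropy spectrum of the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$, whereas the probability distribution
of $\frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}$ is called the
mutual information spectrum of the channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ given the input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ (cf. Han and Verdú [7]).

Proof of Lemma 3.1:

For each $\mathbf{v} \in \mathcal{V}^n$, generate
$\mathbf{x}(\mathbf{v}) \in \mathcal{X}^n$ at random according to the
conditional distribution $P_{X^n \mid V^n}(\cdot \mid \mathbf{v})$ and let
$\mathbf{x}(\mathbf{v})$ be the codeword for $\mathbf{v}$. In other words,
we define the encoder $\varphi_n: \mathcal{V}^n \to \mathcal{X}^n$ as
$\varphi_n(\mathbf{v}) = \mathbf{x}(\mathbf{v})$, where
$\{\mathbf{x}(\mathbf{v}) \mid \forall \mathbf{v} \in \mathcal{V}^n\}$ are
all independently generated. We define the decoder
$\psi_n: \mathcal{Y}^n \to \mathcal{V}^n$ as follows: Set

$$S_n = \left\{ (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in \mathcal{Z}_n \,\middle|\,
  \frac{1}{n} \log \frac{W^n(\mathbf{y} \mid \mathbf{x})}{P_{Y^n}(\mathbf{y})}
  > \frac{1}{n} \log \frac{1}{P_{V^n}(\mathbf{v})} + \gamma \right\}, \tag{3.2}$$

$$S_n(\mathbf{v}) = \{ (\mathbf{x}, \mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n \mid
  (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in S_n \}, \tag{3.3}$$

where for simplicity we have put
$\mathcal{Z}_n \equiv \mathcal{V}^n \times \mathcal{X}^n \times \mathcal{Y}^n$.
Suppose that the decoder $\psi_n$ received a channel output
$\mathbf{y} \in \mathcal{Y}^n$. If there exists one and only one
$\mathbf{v} \in \mathcal{V}^n$ such that
$(\mathbf{x}(\mathbf{v}), \mathbf{y}) \in S_n(\mathbf{v})$, define the
decoder as $\psi_n(\mathbf{y}) = \mathbf{v}$; otherwise, let the output of
the decoder $\psi_n(\mathbf{y}) \in \mathcal{V}^n$ be arbitrary. Then, the
probability $\varepsilon_n$ of error for this pair $(\varphi_n, \psi_n)$
(averaged over all the realizations of the random code) is given by

$$\varepsilon_n = \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})\, \varepsilon_n(\mathbf{v}), \tag{3.4}$$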

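where $\varepsilon_n(\mathbf{v})$ is the probability of error (averaged over
all the realizations of the random code) when
$\mathbf{v} \in \mathcal{V}^n$ is the source output. We can evaluate
$\varepsilon_n(\mathbf{v})$ as

$$\begin{aligned}
\varepsilon_n(\mathbf{v})
 &\le \Pr\{ (\mathbf{x}(\mathbf{v}), Y^n) \notin S_n(\mathbf{v}) \}
   + \Pr\Big\{ \bigcup_{\mathbf{v}': \mathbf{v}' \ne \mathbf{v}}
       \{ (\mathbf{x}(\mathbf{v}'), Y^n) \in S_n(\mathbf{v}') \} \Big\} \\
 &\le \Pr\{ (\mathbf{x}(\mathbf{v}), Y^n) \notin S_n(\mathbf{v}) \}
   + \sum_{\mathbf{v}': \mathbf{v}' \ne \mathbf{v}}
       \Pr\{ (\mathbf{x}(\mathbf{v}'), Y^n) \in S_n(\mathbf{v}') \},
\end{aligned} \tag{3.5}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input
$\mathbf{x}(\mathbf{v})$. The first term on the right-hand side of (3.5) is
written as

$$A_n(\mathbf{v}) \equiv \Pr\{ (\mathbf{x}(\mathbf{v}), Y^n) \notin S_n(\mathbf{v}) \}
  = \sum_{(\mathbf{x}, \mathbf{y}) \notin S_n(\mathbf{v})}
    P_{X^n Y^n \mid V^n}(\mathbf{x}, \mathbf{y} \mid \mathbf{v}).$$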

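Hence,

$$\begin{aligned}
\sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) A_n(\mathbf{v})
 &= \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})
    \sum_{(\mathbf{x}, \mathbf{y}) \notin S_n(\mathbf{v})}
    P_{X^n Y^n \mid V^n}(\mathbf{x}, \mathbf{y} \mid \mathbf{v}) \\
 &= \sum_{(\mathbf{v}, \mathbf{x}, \mathbf{y}) \notin S_n}
    P_{V^n X^n Y^n}(\mathbf{v}, \mathbf{x}, \mathbf{y}) \\
 &= \Pr\{ V^n X^n Y^n \notin S_n \}.
\end{aligned} \tag{3.6}$$

On the other hand, noting that $\mathbf{x}(\mathbf{v}'), \mathbf{x}(\mathbf{v})$
$(\mathbf{v}' \ne \mathbf{v})$ are independent and hence
$\mathbf{x}(\mathbf{v}'), Y^n$ are also independent, the second term on the
right-hand side of (3.5) is evaluated as

$$\begin{aligned}
B_n(\mathbf{v})
 &\equiv \sum_{\mathbf{v}': \mathbf{v}' \ne \mathbf{v}}
    \Pr\{ (\mathbf{x}(\mathbf{v}'), Y^n) \in S_n(\mathbf{v}') \} \\
 &= \sum_{\mathbf{v}': \mathbf{v}' \ne \mathbf{v}}
    \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')}
    P_{Y^n \mid V^n}(\mathbf{y} \mid \mathbf{v})
    P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v}') \\
 &\le \sum_{\mathbf{v}' \in \mathcal{V}^n}
    \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')}
    P_{Y^n \mid V^n}(\mathbf{y} \mid \mathbf{v})
    P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v}').
\end{aligned}$$

Hence,

$$\begin{aligned}
\sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) B_n(\mathbf{v})
 &\le \sum_{\mathbf{v} \in \mathcal{V}^n} \sum_{\mathbf{v}' \in \mathcal{V}^n}
    \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')}
    P_{V^n}(\mathbf{v}) P_{Y^n \mid V^n}(\mathbf{y} \mid \mathbf{v})
    P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v}') \\
 &= \sum_{\mathbf{v}' \in \mathcal{V}^n}
    \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')}
    P_{Y^n}(\mathbf{y}) P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v}').
\end{aligned} \tag{3.7}$$

On the other hand, in view of (3.2), (3.3),
$(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')$ implies

$$P_{Y^n}(\mathbf{y}) \le P_{V^n}(\mathbf{v}')\, W^n(\mathbf{y} \mid \mathbf{x})\, e^{-n\gamma}.$$

Therefore, (3.7) is further transformed to

$$\begin{aligned}
\sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) B_n(\mathbf{v})
 &\le e^{-n\gamma} \sum_{\mathbf{v}' \in \mathcal{V}^n}
    \sum_{(\mathbf{x}, \mathbf{y}) \in S_n(\mathbf{v}')}
    P_{V^n}(\mathbf{v}') P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v}')
    W^n(\mathbf{y} \mid \mathbf{x}) \\
 &\le e^{-n\gamma} \sum_{(\mathbf{v}', \mathbf{x}, \mathbf{y}) \in \mathcal{Z}_n}
    P_{V^n}(\mathbf{v}') P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v}')
    W^n(\mathbf{y} \mid \mathbf{x}) \\
 &= e^{-n\gamma}.
\end{aligned} \tag{3.8}$$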
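Then, from (3.4), (3.6) and (3.8) it follows that

$$\begin{aligned}
\varepsilon_n
 &= \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v})\, \varepsilon_n(\mathbf{v}) \\
 &\le \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) A_n(\mathbf{v})
    + \sum_{\mathbf{v} \in \mathcal{V}^n} P_{V^n}(\mathbf{v}) B_n(\mathbf{v}) \\
 &\le \Pr\{ V^n X^n Y^n \notin S_n \} + e^{-n\gamma}.
\end{aligned}$$

Thus, there must exist a deterministic $(n, \varepsilon_n)$ code such that

$$\varepsilon_n \le \Pr\{ V^n X^n Y^n \notin S_n \} + e^{-n\gamma},$$

thereby proving Lemma 3.1. □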
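
To get a feeling for the size of the bound (3.1), it can be evaluated in closed form in the special case of Remark 3.1. The sketch below is an illustration under stated assumptions rather than anything taken from the paper: the channel is assumed to be a memoryless binary symmetric channel with crossover probability `p`, the input $X^n$ is assumed uniform on $\{0,1\}^n$ (so that $P_{Y^n}$ is uniform), the source is uniform on $M_n = e^{n \cdot \mathrm{rate}}$ messages, and logarithms are natural. The function names are hypothetical.

```python
import math


def log_binomial_pmf(n, k, p):
    """log of C(n, k) p^k (1-p)^(n-k), computed in log space to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def feinstein_bound_bsc(n, rate, p, gamma):
    """Right-hand side of (3.1) for a uniform source of rate `rate` (nats/symbol),
    uniform channel input and a BSC(p) with 0 < p < 1 (assumed setting)."""
    threshold = rate + gamma  # (1/n) log M_n + gamma
    spectrum_tail = 0.0
    for k in range(n + 1):
        # Information density (1/n) log W^n(y|x)/P_{Y^n}(y) when k bits are flipped;
        # P_{Y^n}(y) = 2^{-n} because the input is uniform.
        density = math.log(2) + (k * math.log(p) + (n - k) * math.log(1 - p)) / n
        if density <= threshold:
            spectrum_tail += math.exp(log_binomial_pmf(n, k, p))
    return spectrum_tail + math.exp(-n * gamma)


if __name__ == "__main__":
    # The capacity of a BSC(0.1) is log 2 - H(0.1), roughly 0.368 nats.
    for rate in (0.2, 0.5):
        print(rate, feinstein_bound_bsc(n=500, rate=rate, p=0.1, gamma=0.05))
```

For a rate below capacity the bound is small already at moderate block length, whereas for a rate above capacity the spectrum term is close to 1 and the bound becomes uninformative.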



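Lemma 3.2 (Generalization of Verdú-Han's lemma) Let
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ and
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be a general source and a general
channel, respectively, and let
$\varphi_n: \mathcal{V}^n \to \mathcal{X}^n$ be the encoder of an
$(n, \varepsilon_n)$ code for $(V^n, W^n)$. Put $X^n = \varphi_n(V^n)$ and
let $Y^n$ be the channel output via $W^n$ due to the channel input $X^n$,
where $V^n \to X^n \to Y^n$. Then, for every $n = 1, 2, \cdots$, it holds
that

$$\varepsilon_n \ge \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma \right\} - e^{-n\gamma}, \tag{3.9}$$

where $\gamma > 0$ is an arbitrary positive number.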

Remark 3.2 In a special case where the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is uniformly distributed on the
message set $\mathcal{M}_n = \{1, 2, \cdots, M_n\}$, it follows that

$$\frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} = \frac{1}{n} \log M_n,$$

which implies that the entropy spectrum of the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is exactly one point spectrum
concentrated on $\frac{1}{n} \log M_n$. Therefore, in this special case,
Lemma 3.2 reduces to Verdú-Han's lemma [8]. □



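Proof of Lemma 3.2:

Define

$$L_n = \left\{ (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in \mathcal{Z}_n \,\middle|\,
  \frac{1}{n} \log \frac{W^n(\mathbf{y} \mid \mathbf{x})}{P_{Y^n}(\mathbf{y})}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(\mathbf{v})} - \gamma \right\}, \tag{3.10}$$

and, for each $\mathbf{v} \in \mathcal{V}^n$, set

$$D(\mathbf{v}) = \{ \mathbf{y} \in \mathcal{Y}^n \mid \psi_n(\mathbf{y}) = \mathbf{v} \},$$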

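that is, $D(\mathbf{v})$ is the decoding set for $\mathbf{v}$. Moreover, for
each $(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n$, set

$$B(\mathbf{v}, \mathbf{x}) = \{ \mathbf{y} \in \mathcal{Y}^n \mid
  (\mathbf{v}, \mathbf{x}, \mathbf{y}) \in L_n \}. \tag{3.11}$$

Then, noting the Markov chain property (2.1), we have

$$\begin{aligned}
\Pr\{ V^n X^n Y^n \in L_n \}
 &= \sum_{(\mathbf{v}, \mathbf{x}, \mathbf{y}) \in L_n}
    P_{V^n X^n Y^n}(\mathbf{v}, \mathbf{x}, \mathbf{y}) \\
 &= \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})\,
    W^n(B(\mathbf{v}, \mathbf{x}) \mid \mathbf{x}) \\
 &= \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})\,
    W^n(B(\mathbf{v}, \mathbf{x}) \cap D^c(\mathbf{v}) \mid \mathbf{x}) \\
 &\quad + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})\,
    W^n(B(\mathbf{v}, \mathbf{x}) \cap D(\mathbf{v}) \mid \mathbf{x}) \\
 &\le \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})\,
    W^n(D^c(\mathbf{v}) \mid \mathbf{x}) \\
 &\quad + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})\,
    W^n(B(\mathbf{v}, \mathbf{x}) \cap D(\mathbf{v}) \mid \mathbf{x}) \\
 &= \varepsilon_n + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})\,
    W^n(B(\mathbf{v}, \mathbf{x}) \cap D(\mathbf{v}) \mid \mathbf{x}) \\
 &= \varepsilon_n + \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{V^n X^n}(\mathbf{v}, \mathbf{x})
    \sum_{\mathbf{y} \in B(\mathbf{v}, \mathbf{x}) \cap D(\mathbf{v})}
    W^n(\mathbf{y} \mid \mathbf{x}),
\end{aligned} \tag{3.12}$$

where we have used the relation:

$$\varepsilon_n = \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
  P_{V^n X^n}(\mathbf{v}, \mathbf{x})\, W^n(D^c(\mathbf{v}) \mid \mathbf{x}).$$

Now, it follows from (3.10) and (3.11) that
$\mathbf{y} \in B(\mathbf{v}, \mathbf{x})$ implies

$$W^n(\mathbf{y} \mid \mathbf{x}) \le e^{-n\gamma}\, \frac{P_{Y^n}(\mathbf{y})}{P_{V^n}(\mathbf{v})},$$

which is substituted into the right-hand side of (3.12) to yield

$$\begin{aligned}
\Pr\{ V^n X^n Y^n \in L_n \}
 &\le \varepsilon_n + e^{-n\gamma}
    \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v})
    \sum_{\mathbf{y} \in B(\mathbf{v}, \mathbf{x}) \cap D(\mathbf{v})}
    P_{Y^n}(\mathbf{y}) \\
 &\le \varepsilon_n + e^{-n\gamma}
    \sum_{(\mathbf{v}, \mathbf{x}) \in \mathcal{V}^n \times \mathcal{X}^n}
    P_{X^n \mid V^n}(\mathbf{x} \mid \mathbf{v})\, P_{Y^n}(D(\mathbf{v})) \\
 &= \varepsilon_n + e^{-n\gamma}
    \sum_{\mathbf{v} \in \mathcal{V}^n} P_{Y^n}(D(\mathbf{v})) \\
 &= \varepsilon_n + e^{-n\gamma},
\end{aligned}$$

where the last equality holds because the decoding sets $D(\mathbf{v})$
partition $\mathcal{Y}^n$, thereby proving the claim of the lemma. □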



4 Theorems on Transmissibility 

In this section we give both a sufficient condition and a necessary
condition for the transmissibility with a given general source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ and a given general channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$.

First, Lemma 3.1 immediately leads us to the following direct theorem:

Theorem 4.1 (Direct theorem) Let $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$,
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be a general source and a general
channel, respectively. If there exist some channel input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and some sequence
$\{\gamma_n\}_{n=1}^{\infty}$ satisfying

$$\gamma_n > 0, \quad \gamma_n \to 0 \quad \text{and} \quad n\gamma_n \to \infty \quad (n \to \infty) \tag{4.1}$$

for which it holds that

$$\lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} = 0, \tag{4.2}$$

then the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over
the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, where $Y^n$ is the channel
output via $W^n$ due to the channel input $X^n$ and $V^n \to X^n \to Y^n$.

Proof:

Since in Lemma 3.1 we can choose the constant $\gamma > 0$ so as to depend
on $n$, let us take, instead of $\gamma$, an arbitrary
$\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1). Then, the second
term on the right-hand side of (3.1) vanishes as $n$ tends to $\infty$, and
hence it follows from (4.2) that the right-hand side of (3.1) vanishes as
$n$ tends to $\infty$. Therefore, the $(n, \varepsilon_n)$ code as specified
in Lemma 3.1 satisfies $\lim_{n \to \infty} \varepsilon_n = 0$. □
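
Condition (4.1) asks for a sequence that vanishes, but slowly enough that $n\gamma_n$ still diverges. As a small sanity check, the sketch below uses $\gamma_n = n^{-1/2}$, which is only one assumed example of such a sequence (it is not prescribed by the paper), and prints the three quantities that matter in the proof: $\gamma_n$, $n\gamma_n$ and the residual term $e^{-n\gamma_n}$.

```python
import math


def gamma_n(n):
    """An assumed example sequence satisfying (4.1): gamma_n = n^(-1/2)."""
    return n ** -0.5


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        g = gamma_n(n)
        # g -> 0, n * g = sqrt(n) -> infinity, exp(-n * g) -> 0
        print(n, g, n * g, math.exp(-n * g))
```

Any sequence of the form $n^{-a}$ with $0 < a < 1$ would serve equally well, whereas $\gamma_n = 1/n$ fails because $n\gamma_n$ stays bounded.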

Next, Lemma 3.2 immediately leads us to the following converse theorem: 

Theorem 4.2 (Converse theorem) Suppose that a general source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over a general
channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$. Let the channel input be
$\mathbf{X} = \{X^n \equiv \varphi_n(V^n)\}_{n=1}^{\infty}$ where
$\varphi_n: \mathcal{V}^n \to \mathcal{X}^n$ is the channel encoder. Then,
for any sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1),
it holds that

$$\lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} = 0, \tag{4.3}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$
and $V^n \to X^n \to Y^n$.

Proof:

If $\mathbf{V}$ is transmissible over $\mathbf{W}$, then, by Definition 2.1
there exists an $(n, \varepsilon_n)$ code such that
$\lim_{n \to \infty} \varepsilon_n = 0$. Hence, the claim of the theorem
immediately follows from (3.9) in Lemma 3.2 with $\gamma_n$ instead of
$\gamma$. □

Remark 4.1 Comparing (4.3) in Theorem 4.2 with (4.2) in Theorem 4.1, we
observe that the only difference is that the sign of $\gamma_n$ is changed
from $+$ to $-$. Since $\gamma_n$ vanishes as $n$ tends to $\infty$, this
difference is asymptotically negligible. □



Now, let us think of the implication of conditions (4.2) and (4.3). First,
let us think of (4.2). Putting

$$A_n \equiv \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}, \qquad
  B_n \equiv \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)}$$

for simplicity, (4.2) is written as

$$\alpha_n \equiv \Pr\{ A_n \le B_n + \gamma_n \} \to 0 \quad (n \to \infty), \tag{4.4}$$

which can be transformed to

$$\begin{aligned}
\Pr\{ A_n \le B_n + \gamma_n \}
 &= \sum_u \Pr\{ B_n = u \} \Pr\{ A_n \le B_n + \gamma_n \mid B_n = u \} \\
 &= \sum_u \Pr\{ B_n = u \} \Pr\{ A_n \le u + \gamma_n \mid B_n = u \}.
\end{aligned}$$

Set

$$T_n = \{ u \mid \Pr\{ A_n \le u + \gamma_n \mid B_n = u \} \le \sqrt{\alpha_n} \}, \tag{4.5}$$

then by virtue of (4.4) and Markov inequality, we have

$$\Pr\{ B_n \in T_n \} \ge 1 - \sqrt{\alpha_n}. \tag{4.6}$$

Let us now define the upper cumulative probabilities for $A_n$, $B_n$ by

$$P_n(t) = \Pr\{ A_n \ge t \}, \qquad Q_n(t) = \Pr\{ B_n \ge t \},$$

then it follows that

$$\begin{aligned}
P_n(t)
 &= \sum_u \Pr\{ B_n = u \} \Pr\{ A_n \ge t \mid B_n = u \} \\
 &\ge \sum_{u \ge t - \gamma_n} \Pr\{ B_n = u \} \Pr\{ A_n \ge t \mid B_n = u \} \\
 &\ge \sum_{u \in T_n:\, u \ge t - \gamma_n}
      \Pr\{ B_n = u \} \Pr\{ A_n \ge u + \gamma_n \mid B_n = u \}.
\end{aligned} \tag{4.7}$$

On the other hand, by means of (4.5), $u \in T_n$ implies that

$$\Pr\{ A_n > u + \gamma_n \mid B_n = u \} \ge 1 - \sqrt{\alpha_n}.$$

Therefore, by (4.6), (4.7) it is concluded that

$$\begin{aligned}
P_n(t)
 &\ge (1 - \sqrt{\alpha_n}) \sum_{u \in T_n:\, u \ge t - \gamma_n} \Pr\{ B_n = u \} \\
 &\ge (1 - \sqrt{\alpha_n}) \left( Q_n(t - \gamma_n) - \Pr\{ B_n \notin T_n \} \right) \\
 &\ge (1 - \sqrt{\alpha_n}) \left( Q_n(t - \gamma_n) - \sqrt{\alpha_n} \right) \\
 &\ge Q_n(t - \gamma_n) - 2\sqrt{\alpha_n}.
\end{aligned}$$

That is,

$$P_n(t) \ge Q_n(t - \gamma_n) - 2\sqrt{\alpha_n}.$$

This means that, for all $t$, the upper cumulative probability $P_n(t)$ of
$A_n$ is larger than or equal to the upper cumulative probability
$Q_n(t - \gamma_n)$ of $B_n$, except for the asymptotically vanishing
difference $2\sqrt{\alpha_n}$. This in turn implies that, as a whole, the
mutual information spectrum of the channel is shifted to the right in
comparison with the entropy spectrum of the source. With $-\gamma_n$ instead
of $\gamma_n$, the same implication follows also from (4.3). It is such an
allocation relation between the mutual information spectrum and the entropy
spectrum that enables us to make a transmissible joint source-channel
coding.
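
The inequality $P_n(t) \ge Q_n(t - \gamma_n) - 2\sqrt{\alpha_n}$ holds for any joint distribution of $(A_n, B_n)$, so it can be verified mechanically on a small example. The sketch below is illustrative only: the function name `check_spectrum_shift` and the three-point joint distribution are assumptions made for the demonstration, and exact rational arithmetic is used so that the comparisons at the jump points are not disturbed by rounding.

```python
import math
from fractions import Fraction as F


def check_spectrum_shift(joint, gamma):
    """Check P(t) >= Q(t - gamma) - 2*sqrt(alpha) for all t.

    joint: dict (a, b) -> probability of the pair (A_n, B_n) = (a, b).
    Returns (alpha, holds), where alpha = Pr{A_n <= B_n + gamma}.
    """
    alpha = sum(p for (a, b), p in joint.items() if a <= b + gamma)

    def upper_a(t):  # P_n(t) = Pr{A_n >= t}
        return sum(p for (a, _), p in joint.items() if a >= t)

    def upper_b(t):  # Q_n(t) = Pr{B_n >= t}
        return sum(p for (_, b), p in joint.items() if b >= t)

    # Both sides are left-continuous step functions of t, so it suffices to
    # test the inequality at their jump points.
    jump_points = {a for a, _ in joint} | {b + gamma for _, b in joint}
    slack = 2 * math.sqrt(alpha)
    holds = all(upper_a(t) >= upper_b(t - gamma) - slack for t in jump_points)
    return alpha, holds


if __name__ == "__main__":
    # Assumed toy distribution: A_n usually exceeds B_n + gamma.
    joint = {
        (F(1), F(1, 2)): F(9, 20),
        (F(4, 5), F(3, 10)): F(9, 20),
        (F(1, 5), F(3, 5)): F(1, 10),
    }
    print(check_spectrum_shift(joint, gamma=F(1, 20)))
```

In this toy distribution only the third point violates $A_n > B_n + \gamma_n$, so $\alpha_n = 1/10$ and the slack $2\sqrt{\alpha_n}$ is about 0.63.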

However, it is not easy in general to check whether conditions (4.2), (4.3)
in these forms are satisfied or not. Therefore, we equivalently rewrite
conditions (4.2), (4.3) into alternative information-spectrum forms that
hopefully make it easier to depict an intuitive picture. This can actually
be done by re-choosing the input and output variables $X^n, Y^n$ as below.
These forms are useful in order to see the relation of conditions (4.2),
(4.3) with the so-called separation theorem.

First, we show another information-spectrum form equivalent to the
sufficient condition (4.2) in Theorem 4.1.

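Theorem 4.3 (Equivalence of sufficient conditions) The following two
conditions are equivalent:

1) For some channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and some
sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1), it holds
that

$$\lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} = 0, \tag{4.8}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$
and $V^n \to X^n \to Y^n$.

2) (Strict domination: Vembu, Verdú and Steinberg [9]) For some channel
input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, some sequence
$\{c_n\}_{n=1}^{\infty}$ and some sequence $\{\gamma_n\}_{n=1}^{\infty}$
satisfying condition (4.1), it holds that

$$\lim_{n \to \infty} \left( \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge c_n \right\}
  + \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le c_n + \gamma_n \right\} \right) = 0, \tag{4.9}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$.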



Remark 4.2 (separation in general) The sufficient condition 2) in Theorem
4.3 means that the entropy spectrum of the source and the mutual information
spectrum of the channel are asymptotically completely split with a vacant
boundary of asymptotically vanishing width $\gamma_n$, and the former is
placed to the left of the latter, where these two spectra may oscillate
"synchronously" with $n$. In the case where such a separation condition 2)
is satisfied, we can split reliable joint source-channel coding in two steps
as follows (separation of source coding and channel coding): We first encode
the source output $V^n$ at the fixed-length coding rate
$c_n = \frac{1}{n} \log M_n$ ($M_n$ is the size of the message set
$\mathcal{M}_n$), and then encode the output of the source encoder into the
channel. The error probability $\varepsilon_n$ for this two step coding is
upper bounded by the sum of the error probability of the fixed-length source
coding (cf. Vembu, Verdú and Steinberg [9]; Han [11, Lemma 1.3.1]):

$$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge c_n \right\}$$

and the "maximum" error probability of the channel coding (cf. Feinstein
[2], Ash [3], Han [11, Lemma 3.4.1]):

$$\Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le c_n + \gamma_n \right\} + e^{-n\gamma_n}.$$

It then follows from (4.9) that both of these two error probabilities vanish
as $n$ tends to $\infty$, where it should be noted that
$e^{-n\gamma_n} \to 0$ as $n \to \infty$. Thus, we have
$\lim_{n \to \infty} \varepsilon_n = 0$ to conclude that the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is transmissible over the channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$. This can be regarded as providing
another proof of Theorem 4.1. □



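Proof of Theorem 4.3:

2) $\Rightarrow$ 1): For any joint probability distribution $P_{V^n X^n}$
for $V^n$ and $X^n$, we have

$$\begin{aligned}
&\Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
   \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} \\
&\quad \le \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge c_n \right\}
   + \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le c_n + \gamma_n \right\},
\end{aligned}$$

which together with (4.9) implies (4.8).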

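1) $\Rightarrow$ 2): Supposing that condition 1) holds, put

$$\alpha_n \equiv \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} + \gamma_n \right\}, \tag{4.10}$$

and moreover, with $\gamma'_n \equiv \frac{1}{4}\gamma_n$ and
$\delta_n \equiv \max(\sqrt{\alpha_n}, e^{-n\gamma'_n})$, define

$$d_n = \sup\left\{ R \,\middle|\, \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge R \right\} > \delta_n \right\} - \gamma'_n. \tag{4.11}$$

Furthermore, define

$$S_n = \left\{ \mathbf{v} \in \mathcal{V}^n \,\middle|\, \frac{1}{n} \log \frac{1}{P_{V^n}(\mathbf{v})} \ge d_n \right\}, \tag{4.12}$$

$$\lambda_n^{(1)} = \Pr\{ V^n \in S_n \}, \qquad \lambda_n^{(2)} = \Pr\{ V^n \notin S_n \}, \tag{4.13}$$

then the joint probability distribution $P_{V^n X^n Y^n}$ can be written as
a mixture:

$$P_{V^n X^n Y^n}(\mathbf{v}, \mathbf{x}, \mathbf{y})
  = \lambda_n^{(1)} P_{\tilde{V}^n \tilde{X}^n \tilde{Y}^n}(\mathbf{v}, \mathbf{x}, \mathbf{y})
  + \lambda_n^{(2)} P_{\bar{V}^n \bar{X}^n \bar{Y}^n}(\mathbf{v}, \mathbf{x}, \mathbf{y}), \tag{4.14}$$

where $P_{\tilde{V}^n \tilde{X}^n \tilde{Y}^n}$,
$P_{\bar{V}^n \bar{X}^n \bar{Y}^n}$ are the conditional probability
distributions of $V^n X^n Y^n$ conditioned on $V^n \in S_n$,
$V^n \notin S_n$, respectively. We notice here that the Markov chain
property $V^n \to X^n \to Y^n$ implies
$P_{\tilde{Y}^n \mid \tilde{X}^n} = P_{\bar{Y}^n \mid \bar{X}^n} = W^n$ and
the Markov chain properties
$\tilde{V}^n \to \tilde{X}^n \to \tilde{Y}^n$,
$\bar{V}^n \to \bar{X}^n \to \bar{Y}^n$. We now rewrite (4.10) as

$$\begin{aligned}
\alpha_n
 &= \lambda_n^{(1)} \Pr\left\{ \frac{1}{n} \log \frac{W^n(\tilde{Y}^n \mid \tilde{X}^n)}{P_{Y^n}(\tilde{Y}^n)}
    \le \frac{1}{n} \log \frac{1}{P_{V^n}(\tilde{V}^n)} + \gamma_n \right\} \\
 &\quad + \lambda_n^{(2)} \Pr\left\{ \frac{1}{n} \log \frac{W^n(\bar{Y}^n \mid \bar{X}^n)}{P_{Y^n}(\bar{Y}^n)}
    \le \frac{1}{n} \log \frac{1}{P_{V^n}(\bar{V}^n)} + \gamma_n \right\}.
\end{aligned} \tag{4.15}$$

On the other hand, since (4.11), (4.12) lead to
$\lambda_n^{(1)} \ge \delta_n \ge \sqrt{\alpha_n}$, it follows from (4.15)
that

$$\Pr\left\{ \frac{1}{n} \log \frac{W^n(\tilde{Y}^n \mid \tilde{X}^n)}{P_{Y^n}(\tilde{Y}^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(\tilde{V}^n)} + \gamma_n \right\} \le \sqrt{\alpha_n}. \tag{4.16}$$

Then, by the definition of $\tilde{V}^n$,

$$\frac{1}{n} \log \frac{1}{P_{V^n}(\tilde{V}^n)} \ge d_n,$$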

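and so from (4.16), we obtain

$$\Pr\left\{ \frac{1}{n} \log \frac{W^n(\tilde{Y}^n \mid \tilde{X}^n)}{P_{Y^n}(\tilde{Y}^n)}
  \le d_n + \gamma_n \right\} \le \sqrt{\alpha_n}. \tag{4.17}$$

Next, since it follows from (4.14) that

$$\begin{aligned}
P_{Y^n}(\mathbf{y})
 &= \lambda_n^{(1)} P_{\tilde{Y}^n}(\mathbf{y}) + \lambda_n^{(2)} P_{\bar{Y}^n}(\mathbf{y}) \\
 &\ge \lambda_n^{(1)} P_{\tilde{Y}^n}(\mathbf{y}) \\
 &\ge \delta_n P_{\tilde{Y}^n}(\mathbf{y}) \\
 &\ge e^{-n\gamma'_n} P_{\tilde{Y}^n}(\mathbf{y}),
\end{aligned}$$

we have

$$\frac{1}{n} \log \frac{1}{P_{Y^n}(\tilde{Y}^n)}
  \le \frac{1}{n} \log \frac{1}{P_{\tilde{Y}^n}(\tilde{Y}^n)} + \gamma'_n,$$

which is substituted into (4.17) to get

$$\Pr\left\{ \frac{1}{n} \log \frac{W^n(\tilde{Y}^n \mid \tilde{X}^n)}{P_{\tilde{Y}^n}(\tilde{Y}^n)}
  \le d_n + \gamma_n - \gamma'_n \right\} \le \sqrt{\alpha_n}. \tag{4.18}$$

On the other hand, by the definition (4.11) of $d_n$,

$$\Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge d_n + 2\gamma'_n \right\} \le \delta_n. \tag{4.19}$$

Set $c_n = d_n + 2\gamma'_n$ and note that $\alpha_n \to 0$,
$\delta_n \to 0$ $(n \to \infty)$ and $\gamma'_n = \frac{1}{4}\gamma_n$,
then by (4.18), (4.19) we have

$$\lim_{n \to \infty} \left( \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge c_n \right\}
  + \Pr\left\{ \frac{1}{n} \log \frac{W^n(\tilde{Y}^n \mid \tilde{X}^n)}{P_{\tilde{Y}^n}(\tilde{Y}^n)}
      \le c_n + \frac{1}{4}\gamma_n \right\} \right) = 0.$$

Finally, resetting $\tilde{X}^n$, $\tilde{Y}^n$, $\frac{1}{4}\gamma_n$ as
$X^n$, $Y^n$ and $\gamma_n$, respectively, we conclude that condition 2),
i.e., (4.9) holds. □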

Having established an information-spectrum separation-like form of the 
sufficient condition (4.2) in Theorem 4.1, let us now turn to demonstrate 
several information-spectrum versions derived from the necessary condition 
(4.3) in Theorem 4.2. 

Proposition 4.1 (Necessary conditions) The following two are necessary
conditions for the transmissibility.

1) For some channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and for any
sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1), it holds
that

$$\lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} = 0, \tag{4.20}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$
and $V^n \to X^n \to Y^n$.

2) For any sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1)
and for some channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, it holds
that

$$\lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} = 0, \tag{4.21}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$
and $V^n \to X^n \to Y^n$.

Proof: The necessity of condition 1) immediately follows from necessity
condition (4.3) in Theorem 4.2. Moreover, it is also trivial to see that
condition 1) implies condition 2) as an immediate logical consequence, and
hence condition 2) is also a necessary condition. □

The necessary condition 1) in Theorem 4.4 below is the same as condition 2)
in Proposition 4.1. This is written here again in order to emphasize a
pleasing duality between Theorem 4.3 and Theorem 4.4, which reflects the
duality between the two fundamental Lemmas 3.1 and 3.2.

Theorem 4.4 (Equivalence of necessary conditions) The following two
conditions are equivalent:

1) For any sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1)
and for some channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, it holds
that

$$\lim_{n \to \infty} \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)}
  \le \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} = 0, \tag{4.22}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$
and $V^n \to X^n \to Y^n$.

2) (Domination) For any sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying
condition (4.1) and for some channel input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and some sequence
$\{c_n\}_{n=1}^{\infty}$, it holds that

$$\lim_{n \to \infty} \left( \Pr\left\{ \frac{1}{n} \log \frac{1}{P_{V^n}(V^n)} \ge c_n \right\}
  + \Pr\left\{ \frac{1}{n} \log \frac{W^n(Y^n \mid X^n)}{P_{Y^n}(Y^n)} \le c_n - \gamma_n \right\} \right) = 0, \tag{4.23}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$.

Proof:

This theorem can be proved in entirely the same manner as in the proof of
Theorem 4.3 with $\gamma_n$ replaced by $-\gamma_n$. □



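Remark 4.3 Originally, the definition of domination given by Vembu, Verdú
and Steinberg [9] is not condition 2) in Theorem 4.4 but the following:

2') (Domination) For any sequence $\{d_n\}_{n=1}^{\infty}$ and any sequence
$\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (4.1), there exists some
channel input $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ such that

$$\lim_{n\to\infty} \left( \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge d_n \right\} \times \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le d_n - \gamma_n \right\} \right) = 0 \tag{4.24}$$

holds, where $Y^n$ is the channel output via $W^n$ due to the channel input
$X^n$.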

□ 



This necessary condition 2') is implied by necessary condition 2) in
Theorem 4.4. To see this, set

$$\alpha_n \equiv \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\}, \tag{4.25}$$

$$\beta_n \equiv \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n - \gamma_n \right\}, \tag{4.26}$$

$$\kappa_n \equiv \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge d_n \right\}, \tag{4.27}$$

$$\mu_n \equiv \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le d_n - \gamma_n \right\}. \tag{4.28}$$

Then, we observe that $\kappa_n \le \alpha_n$ if $d_n \ge c_n$, and
$\mu_n \le \beta_n$ if $d_n \le c_n$, and hence it follows from condition
2) that $\kappa_n \mu_n \le \alpha_n + \beta_n \to 0$ as $n$ tends to
$\infty$. Thus, condition 2) implies condition 2'), which means that
condition 2) is stronger than or equivalent to condition 2') as a
necessary condition for the transmissibility. It is not currently clear,
however, whether the two are equivalent. □
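The case split behind the last bound can be written out explicitly. The
following is only a sketch: $\kappa_n$ and $\mu_n$ denote the two
probabilities in (4.27) and (4.28) taken at the threshold $d_n$,
$\alpha_n$ and $\beta_n$ those in (4.25) and (4.26) taken at the threshold
$c_n$, and $Z_n$, $I_n$ are shorthand introduced here (not the paper's
notation) for the normalized self-information and information density.

```latex
% Shorthand (assumed here):
%   Z_n = (1/n) log 1/P_{V^n}(V^n),
%   I_n = (1/n) log W^n(Y^n|X^n)/P_{Y^n}(Y^n).
\begin{align*}
d_n \ge c_n &: \quad \kappa_n \mu_n \le \kappa_n
   = \Pr\{Z_n \ge d_n\} \le \Pr\{Z_n \ge c_n\} = \alpha_n, \\
d_n < c_n &: \quad \kappa_n \mu_n \le \mu_n
   = \Pr\{I_n \le d_n - \gamma_n\} \le \Pr\{I_n \le c_n - \gamma_n\} = \beta_n.
\end{align*}
% In either case:
\[
\kappa_n \mu_n \le \max(\alpha_n, \beta_n) \le \alpha_n + \beta_n \to 0
\qquad (n \to \infty).
\]
```

Both cases use only that each factor is a probability, hence at most 1,
and that the tail events are monotone in the threshold.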



Remark 4.4 Condition 2) in Theorem 4.4, in this form, is used later to
prove Theorem 6.6 (separation theorem) directly, while condition 2') in
Remark 4.3 is irrelevant for this purpose. □



5 ε-Transmissibility Theorem

So far we have considered only the case where the error probability
$\varepsilon_n$ satisfies the condition
$\lim_{n\to\infty} \varepsilon_n = 0$. However, we can relax this
condition as follows:

$$\limsup_{n\to\infty} \varepsilon_n \le \varepsilon, \tag{5.1}$$

where $\varepsilon$ is any constant such that $0 \le \varepsilon < 1$. (It
is obvious that the special case with $\varepsilon = 0$ coincides with the
case that we have considered so far.) We now say that the source
$\mathbf{V}$ is ε-transmissible over the channel $\mathbf{W}$ when there
exists an $(n, \varepsilon_n)$ code satisfying condition (5.1).

Then, the same arguments as in the previous sections with due slight 
modifications lead to the following two theorems in parallel with Theorem 
4.1 and Theorem 4.2, respectively: 

Theorem 5.1 (ε-Direct theorem) Let $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$,
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be a general source and a general
channel, respectively. If there exist some channel input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ and some sequence
$\{\gamma_n\}_{n=1}^{\infty}$ such that

$$\gamma_n > 0, \quad \gamma_n \to 0 \quad \text{and} \quad n\gamma_n \to \infty \quad (n \to \infty) \tag{5.2}$$

for which it holds that

$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} + \gamma_n \right\} \le \varepsilon, \tag{5.3}$$

then the source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is ε-transmissible
over the channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, where $Y^n$ is the
channel output via $W^n$ due to the channel input $X^n$, and $V^n$ and
$X^n Y^n$ are independent. □
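A rough sketch of why (5.3) suffices is given below. It assumes that the
generalized Feinstein bound of Lemma 3.1 has the usual form shown in the
first display; that form is an assumption here, since the lemma is stated
earlier in the paper and is not reproduced in this section. $Z_n$ and $I_n$
are shorthand introduced only for this sketch.

```latex
% Shorthand (assumed here):
%   Z_n = (1/n) log 1/P_{V^n}(V^n),
%   I_n = (1/n) log W^n(Y^n|X^n)/P_{Y^n}(Y^n).
% Assumed form of the Lemma 3.1 bound, for each n and the given \gamma_n:
\[
\varepsilon_n \le \Pr\{ I_n \le Z_n + \gamma_n \} + e^{-n\gamma_n}.
\]
% Taking limsup and using n\gamma_n \to \infty from (5.2), then (5.3):
\[
\limsup_{n\to\infty} \varepsilon_n
 \le \limsup_{n\to\infty} \Pr\{ I_n \le Z_n + \gamma_n \}
   + \lim_{n\to\infty} e^{-n\gamma_n}
 \le \varepsilon + 0 .
\]
```

Under this assumption, the only change from the case $\varepsilon = 0$ is
that the limit superior of the first term is bounded by $\varepsilon$
instead of vanishing.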



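Theorem 5.2 (ε-Converse theorem) Suppose that a general source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is ε-transmissible over a general
channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$, and let the channel input be
$\mathbf{X} = \{X^n \equiv \varphi_n(V^n)\}_{n=1}^{\infty}$, where
$\varphi_n : \mathcal{V}^n \to \mathcal{X}^n$ is the channel encoder. Then,
for any sequence $\{\gamma_n\}_{n=1}^{\infty}$ satisfying condition (5.2),
it holds that

$$\limsup_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} - \gamma_n \right\} \le \varepsilon, \tag{5.4}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$
and $V^n \to X^n \to Y^n$. □
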

Remark 5.1 It should be noted here that such a sufficient condition (5.3)
as well as such a necessary condition (5.4) for the ε-transmissibility
cannot actually be derived in the way of generalizing the strict
domination in (4.9) and the domination in (4.23). It should be noted also
that, under the ε-transmissibility criterion, joint source-channel coding
is beyond the separation principle. □



6 Separation Theorems of the Traditional Type 

Thus far we have investigated the joint source-channel coding problem from
the viewpoint of information spectra and established the fundamental
theorems (Theorems 4.1 to 4.4). These results appear to be of a different
form from separation theorems of the traditional type. It is then natural
to ask how the separation principle of the information-spectrum type is
related to separation theorems of the traditional type. In this section
we address this question.

To do so, we first need some preparation. We denote by
$R_f(\mathbf{V})$ the infimum of all achievable fixed-length coding rates
for a general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ (for the formal
definition, see Han and Verdú [7], Han [11, Definitions 1.1.1, 1.1.2]),
and denote by $C(\mathbf{W})$ the capacity of a general channel
$\mathbf{W} = \{W^n : \mathcal{X}^n \to \mathcal{Y}^n\}_{n=1}^{\infty}$ (for
the formal definition, see Han and Verdú [7], Han [11, Definitions 3.1.1,
3.1.2]). First, $R_f(\mathbf{V})$ is characterized as
Theorem 6.1 (Han and Verdú [7], Han [11])

$$R_f(\mathbf{V}) = \overline{H}(\mathbf{V}), \tag{6.1}$$

where§

$$\overline{H}(\mathbf{V}) = \operatorname*{p\text{-}lim\,sup}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)}. \tag{6.2}$$
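To make the quantity in (6.2) concrete, the following minimal sketch
computes the tail probability
$\Pr\{\frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge R\}$ exactly for a
stationary memoryless binary source. The binary alphabet, the parameter
values and the function names `entropy` and `spectrum_tail` are
illustrative assumptions, not part of the paper; logarithms are natural.

```python
import math

def entropy(p):
    """Entropy of a Bernoulli(p) letter in nats."""
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def spectrum_tail(n, p, rate):
    """Exact Pr{(1/n) log 1/P(V^n) >= rate} for n i.i.d. Bernoulli(p) letters."""
    tail = 0.0
    for k in range(n + 1):  # k = number of ones in the block
        # log-probability of one fixed block containing k ones
        log_prob_seq = k * math.log(p) + (n - k) * math.log(1 - p)
        if -log_prob_seq / n >= rate:
            # total mass of all C(n, k) such blocks, computed in the log domain
            log_count = (math.lgamma(n + 1) - math.lgamma(k + 1)
                         - math.lgamma(n - k + 1))
            tail += math.exp(log_count + log_prob_seq)
    return tail

if __name__ == "__main__":
    p, n = 0.2, 2000
    h = entropy(p)
    for rate in (h - 0.05, h + 0.05):
        print(f"rate={rate:.3f}  tail={spectrum_tail(n, p, rate):.6f}")
```

For this simple source the tail is close to 1 for rates below the entropy
and close to 0 for rates above it, so the smallest rate at which the tail
vanishes, i.e. the right-hand side of (6.2), is the entropy itself.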

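Next, let us consider the characterization of $C(\mathbf{W})$. Given a
general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ and its input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$, let
$\mathbf{Y} = \{Y^n\}_{n=1}^{\infty}$ be the output due to the input
$\mathbf{X}$ via the channel $\mathbf{W}$. Define

Definition 6.1

$$\underline{I}(\mathbf{X};\mathbf{Y}) = \operatorname*{p\text{-}lim\,inf}_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}. \tag{6.3}$$

Then, the capacity $C(\mathbf{W})$ is characterized as follows.

Theorem 6.2 (Verdú and Han [8], Han [11])

$$C(\mathbf{W}) = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}), \tag{6.4}$$

where $\sup_{\mathbf{X}}$ means the supremum over all possible inputs
$\mathbf{X}$. □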



§ For an arbitrary sequence of real-valued random variables
$\{Z_n\}_{n=1}^{\infty}$, we define the following notions (cf. Han and
Verdú [7], Han [11]):
$\operatorname*{p\text{-}lim\,sup}_{n\to\infty} Z_n \equiv \inf\{\alpha \mid \lim_{n\to\infty} \Pr\{Z_n > \alpha\} = 0\}$
(the limit superior in probability), and
$\operatorname*{p\text{-}lim\,inf}_{n\to\infty} Z_n \equiv \sup\{\beta \mid \lim_{n\to\infty} \Pr\{Z_n < \beta\} = 0\}$
(the limit inferior in probability).

With these preparations, let us turn to the separation theorem problem of
the traditional type. A general source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is said to be information-stable
(cf. Dobrushin [4], Pinsker [5]) if

$$\frac{\frac{1}{n}\log\frac{1}{P_{V^n}(V^n)}}{H_n(V^n)} \to 1 \quad \text{in prob.}, \tag{6.5}$$

where $H_n(V^n) = \frac{1}{n}H(V^n)$ and $H(V^n)$ stands for the entropy of
$V^n$ (cf. Cover and Thomas [13]). Moreover, a general channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is said to be information-stable
(cf. Dobrushin [4], Pinsker [5], Hu [6]) if there exists a channel input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ such that

$$\frac{\frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}}{C_n(W^n)} \to 1 \quad \text{in prob.}, \tag{6.6}$$

where

$$C_n(W^n) = \sup_{X^n} \frac{1}{n} I(X^n;Y^n),$$

and $Y^n$ is the channel output via $W^n$ due to the channel input $X^n$;
and $I(X^n;Y^n)$ is the mutual information between $X^n$ and $Y^n$ (cf.
Cover and Thomas [13]). Then, we can summarize a typical separation
theorem of the traditional type as follows.

Theorem 6.3 (Dobrushin [4], Pinsker [5]) Let the channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be information-stable and suppose
that the limit $\lim_{n\to\infty} C_n(W^n)$ exists, or, let the source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ be information-stable and suppose
that the limit $\lim_{n\to\infty} H_n(V^n)$ exists. Then, the following two
statements hold:

1) If $R_f(\mathbf{V}) < C(\mathbf{W})$, then the source $\mathbf{V}$ is
transmissible over the channel $\mathbf{W}$. In this case, we can separate
the source coding and the channel coding.

2) If the source $\mathbf{V}$ is transmissible over the channel
$\mathbf{W}$, then it must hold that
$R_f(\mathbf{V}) \le C(\mathbf{W})$. □



In order to generalize Theorem 6.3, we need to introduce the concept of
optimistic coding. The "optimistic" standpoint means that we evaluate the
coding reliability with error probability
$\liminf_{n\to\infty} \varepsilon_n = 0$ (that is, for every
$\varepsilon > 0$, $\varepsilon_n < \varepsilon$ for infinitely many $n$).
In contrast with this, the standpoint that we have taken so far is called
pessimistic, with error probability $\lim_{n\to\infty} \varepsilon_n = 0$
(that is, for every $\varepsilon > 0$, $\varepsilon_n < \varepsilon$ for
all sufficiently large $n$).

The following definitions concern optimistic source coding for any
general source $\mathbf{V}$.

Definition 6.2 (Optimistic achievability for source coding)

Rate $R$ is optimistically achievable
$\overset{\text{def}}{\Longleftrightarrow}$ there exists an
$(n, M_n, \varepsilon_n)$-source code satisfying
$\liminf_{n\to\infty} \varepsilon_n = 0$ and

$$\limsup_{n\to\infty} \frac{1}{n}\log M_n \le R,$$

where $\frac{1}{n}\log M_n$ is the coding rate per source letter (see,
e.g., Han [11, Section 1.1]).

Definition 6.3 (Optimistic achievable fixed-length coding rate)

$$\underline{R}_f(\mathbf{V}) = \inf\{R \mid R \text{ is optimistically achievable}\}.$$

Then, for any general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ we have:

Theorem 6.4 (Chen and Alajaji [14])

$$\underline{R}_f(\mathbf{V}) = \inf\left\{ R \,\middle|\, \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge R \right\} = 0 \right\}. \tag{6.7}$$

On the other hand, the next definitions concern the optimistic channel
capacity.

Definition 6.4 (Optimistic achievability for channel coding)

Rate $R$ is optimistically achievable
$\overset{\text{def}}{\Longleftrightarrow}$ there exists an
$(n, M_n, \varepsilon_n)$-channel code satisfying
$\liminf_{n\to\infty} \varepsilon_n = 0$ and

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R,$$

where $\frac{1}{n}\log M_n$ is the coding rate per channel use (see, e.g.,
Han [11, Section 3.1]). □

Definition 6.5 (Optimistic channel capacity)

$$\overline{C}(\mathbf{W}) = \sup\{R \mid R \text{ is optimistically achievable}\}.$$

Then, for a general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ we have:

Theorem 6.5 (Chen and Alajaji [14])

$$\overline{C}(\mathbf{W}) = \sup_{\mathbf{X}} \sup\left\{ R \,\middle|\, \liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R \right\} = 0 \right\}, \tag{6.8}$$

where $Y^n$ is the output due to the input
$\mathbf{X} = \{X^n\}_{n=1}^{\infty}$. □
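Remark 6.1 It is not difficult to check that, in parallel with Theorem 6.4
and Theorem 6.5, Theorem 6.1 and Theorem 6.2 can be rewritten as

$$R_f(\mathbf{V}) = \inf\left\{ R \,\middle|\, \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge R \right\} = 0 \right\}, \tag{6.9}$$

$$C(\mathbf{W}) = \sup_{\mathbf{X}} \sup\left\{ R \,\middle|\, \lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le R \right\} = 0 \right\}, \tag{6.10}$$

from which, together with Theorem 6.4 and Theorem 6.5, it immediately
follows that

$$C(\mathbf{W}) \le \overline{C}(\mathbf{W}), \tag{6.11}$$

$$\underline{R}_f(\mathbf{V}) \le R_f(\mathbf{V}). \tag{6.12}$$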
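Inequality (6.12) can be strict. As a hedged illustration, consider a
hypothetical general source (constructed here, not taken from the paper)
whose block $V^n$ is i.i.d. Bernoulli(0.1) for even $n$ and i.i.d.
Bernoulli(0.4) for odd $n$. The sketch below evaluates the tail
probability appearing in (6.7) and (6.9) exactly; all names and parameter
values are assumptions, and logarithms are natural.

```python
import math

def bernoulli_tail(n, p, rate):
    """Exact Pr{(1/n) log 1/P(V^n) >= rate} for n i.i.d. Bernoulli(p) letters."""
    tail = 0.0
    for k in range(n + 1):  # k = number of ones in the block
        log_prob_seq = k * math.log(p) + (n - k) * math.log(1 - p)
        if -log_prob_seq / n >= rate:
            log_count = (math.lgamma(n + 1) - math.lgamma(k + 1)
                         - math.lgamma(n - k + 1))
            tail += math.exp(log_count + log_prob_seq)
    return tail

def alternating_tail(n, rate, p_even=0.1, p_odd=0.4):
    """Tail probability for the hypothetical source that alternates with the parity of n."""
    p = p_even if n % 2 == 0 else p_odd
    return bernoulli_tail(n, p, rate)

if __name__ == "__main__":
    rate = 0.5  # nats; lies between the two component entropies
    for n in (2000, 2001, 4000, 4001):
        print(n, alternating_tail(n, rate))
```

At a rate between the two component entropies (roughly 0.325 and 0.673
nats), the tail is nearly 0 along even $n$ and nearly 1 along odd $n$.
Its limit inferior is therefore 0 while its limit does not exist, so by
(6.7) and (6.9) such a rate is optimistically achievable but not
achievable in the pessimistic sense.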



Now, we have: 

Theorem 6.6 Let $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ be a general channel
and $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ be a general source. Then, the
following two statements hold:

1) If $R_f(\mathbf{V}) < C(\mathbf{W})$, then the source $\mathbf{V}$ is
transmissible over the channel $\mathbf{W}$. In this case, we can separate
the source coding and the channel coding.

2) If the source $\mathbf{V}$ is transmissible over the channel
$\mathbf{W}$, then it must hold that

$$\underline{R}_f(\mathbf{V}) \le C(\mathbf{W}), \tag{6.13}$$

$$R_f(\mathbf{V}) \le \overline{C}(\mathbf{W}). \tag{6.14}$$

Remark 6.2 As was mentioned in Remark 4.4, we use Theorem 4.4 in order to
prove (6.13) and (6.14), where inequality (6.14) was shown in a rather
roundabout manner by Vembu, Verdú and Steinberg [9] (invoking Domination
2') in Remark 4.3 instead of Domination 2) in Theorem 4.4). □



Proof of Theorem 6.6.

1): Since $R_f(\mathbf{V}) = \overline{H}(\mathbf{V})$ and
$C(\mathbf{W}) = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y})$ by
Theorem 6.1 and Theorem 6.2, the inequality
$R_f(\mathbf{V}) < C(\mathbf{W})$ implies that condition 2) in Theorem 4.3
holds for $\mathbf{X} = \{X^n\}_{n=1}^{\infty}$ attaining the supremum
$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y})$ with, for example,
$c_n = \frac{1}{2}(R_f(\mathbf{V}) + C(\mathbf{W}))$. Therefore, the source
$\mathbf{V}$ is transmissible over the channel $\mathbf{W}$.
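The check that this constant choice of $c_n$ satisfies the strict
domination condition (4.9) is short. A sketch follows, assuming
$R_f(\mathbf{V}) < C(\mathbf{W})$ and an input $\mathbf{X}$ with
$\underline{I}(\mathbf{X};\mathbf{Y}) = C(\mathbf{W})$; the symbols
$\Delta$, $Z_n$ and $I_n$ are shorthand introduced only for this sketch.

```latex
% Shorthand (assumed here):
%   Z_n = (1/n) log 1/P_{V^n}(V^n),
%   I_n = (1/n) log W^n(Y^n|X^n)/P_{Y^n}(Y^n),
%   \Delta = C(W) - R_f(V) > 0, so that c_n = R_f(V) + \Delta/2.
\begin{align*}
\Pr\{ Z_n \ge c_n \}
  &= \Pr\{ Z_n \ge \overline{H}(\mathbf{V}) + \Delta/2 \} \to 0
  && \text{(definition of p-lim sup)}, \\
\Pr\{ I_n \le c_n + \gamma_n \}
  &\le \Pr\{ I_n \le \underline{I}(\mathbf{X};\mathbf{Y}) - \Delta/4 \} \to 0
  && \text{(once } \gamma_n \le \Delta/4 \text{; definition of p-lim inf)}.
\end{align*}
% The sum of the two probabilities therefore tends to 0, which is (4.9).
```

The second line uses
$c_n + \gamma_n \le R_f(\mathbf{V}) + 3\Delta/4 = C(\mathbf{W}) - \Delta/4$
for all sufficiently large $n$, which holds because $\gamma_n \to 0$.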

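2): If the source $\mathbf{V}$ is transmissible over the channel
$\mathbf{W}$, then condition 2) in Theorem 4.4 holds with some
$\{c_n\}_{n=1}^{\infty}$, i.e.,

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge c_n \right\} = 0, \tag{6.15}$$

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le c_n - \gamma_n \right\} = 0. \tag{6.16}$$

Since $\lim_{n\to\infty} \gamma_n = 0$, these two conditions with any small
constant $\delta > 0$ lead us to the following formulas:

$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge \liminf_{n\to\infty} c_n + \delta \right\} = 0, \tag{6.17}$$

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)} \ge \limsup_{n\to\infty} c_n + \delta \right\} = 0, \tag{6.18}$$

$$\lim_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \liminf_{n\to\infty} c_n - \delta \right\} = 0, \tag{6.19}$$

$$\liminf_{n\to\infty} \Pr\left\{ \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)} \le \limsup_{n\to\infty} c_n - \delta \right\} = 0. \tag{6.20}$$

Then, Theorem 6.4 and (6.17) imply that
$\underline{R}_f(\mathbf{V}) \le \liminf_{n\to\infty} c_n$, whereas (6.19)
implies that
$\underline{I}(\mathbf{X};\mathbf{Y}) \ge \liminf_{n\to\infty} c_n$.
Therefore, by Theorem 6.2 we have

$$\underline{R}_f(\mathbf{V}) \le \liminf_{n\to\infty} c_n \le \underline{I}(\mathbf{X};\mathbf{Y}) \le \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = C(\mathbf{W}).$$

On the other hand, (6.18) implies that
$\overline{H}(\mathbf{V}) \le \limsup_{n\to\infty} c_n$. Furthermore,
(6.20) together with Theorem 6.5 gives us

$$\overline{H}(\mathbf{V}) \le \limsup_{n\to\infty} c_n \le \overline{C}(\mathbf{W}).$$

Finally, note that $R_f(\mathbf{V}) = \overline{H}(\mathbf{V})$ by Theorem
6.1. □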

We are now interested in the problem of what conditions are needed to
attain the equalities $\underline{R}_f(\mathbf{V}) = R_f(\mathbf{V})$
and/or $C(\mathbf{W}) = \overline{C}(\mathbf{W})$ in Theorem 6.6 and so
on. To see this, we need the following four definitions:
Definition 6.6 A general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is
said to satisfy the strong converse property if

$$\underline{H}(\mathbf{V}) = \overline{H}(\mathbf{V})$$

holds (as for the operational meaning, refer to Han [11]), where

$$\underline{H}(\mathbf{V}) = \operatorname*{p\text{-}lim\,inf}_{n\to\infty} \frac{1}{n}\log\frac{1}{P_{V^n}(V^n)}.$$

Definition 6.7 A general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is
said to satisfy the strong converse property if

$$\sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}) = \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y}) \tag{6.21}$$

holds (as for the operational meaning, refer to Han [11], Verdú and Han
[8]), where

$$\overline{I}(\mathbf{X};\mathbf{Y}) = \operatorname*{p\text{-}lim\,sup}_{n\to\infty} \frac{1}{n}\log\frac{W^n(Y^n|X^n)}{P_{Y^n}(Y^n)}.$$

Definition 6.8 A general source $\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ is
said to satisfy the semi-strong converse property if for all divergent
subsequences $\{n_i\}_{i=1}^{\infty}$ of positive integers such that
$n_1 < n_2 < \cdots \to \infty$ it holds that

$$\operatorname*{p\text{-}lim\,sup}_{i\to\infty} \frac{1}{n_i}\log\frac{1}{P_{V^{n_i}}(V^{n_i})} = \overline{H}(\mathbf{V}). \tag{6.22}$$

Definition 6.9 A general channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ is
said to satisfy the semi-strong converse property if for all divergent
subsequences $\{n_i\}_{i=1}^{\infty}$ of positive integers such that
$n_1 < n_2 < \cdots \to \infty$ it holds that

$$\operatorname*{p\text{-}lim\,inf}_{i\to\infty} \frac{1}{n_i}\log\frac{W^{n_i}(Y^{n_i}|X^{n_i})}{P_{Y^{n_i}}(Y^{n_i})} \le \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}), \tag{6.23}$$

where $Y^n$ is the channel output via $W^n$ due to the channel input
$X^n$. □



With these definitions, we have the following lemmas:

Lemma 6.1

1) The information-stability of a source $\mathbf{V}$ (resp. a channel
$\mathbf{W}$), together with the existence of the limit
$\lim_{n\to\infty} H_n(V^n)$ (resp. $\lim_{n\to\infty} C_n(W^n)$), implies
the strong converse property of $\mathbf{V}$ (resp. $\mathbf{W}$).

2) The strong converse property of a source $\mathbf{V}$ (resp. a channel
$\mathbf{W}$) implies the semi-strong converse property of $\mathbf{V}$
(resp. $\mathbf{W}$). □
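Part 2) follows from a short chain of inequalities between limits in
probability along a subsequence and along the full sequence. A sketch,
with $Z_n$ and $I_n$ as shorthand introduced only here:

```latex
% Shorthand (assumed here):
%   Z_n = (1/n) log 1/P_{V^n}(V^n),
%   I_n = (1/n) log W^n(Y^n|X^n)/P_{Y^n}(Y^n).
% Source: for any divergent subsequence {n_i},
\[
\underline{H}(\mathbf{V})
 \le \operatorname*{p\text{-}lim\,inf}_{i\to\infty} Z_{n_i}
 \le \operatorname*{p\text{-}lim\,sup}_{i\to\infty} Z_{n_i}
 \le \overline{H}(\mathbf{V}),
\]
% so \underline{H}(V) = \overline{H}(V) forces equality throughout, i.e. (6.22).
% Channel: for any input X and any divergent subsequence {n_i},
\[
\operatorname*{p\text{-}lim\,inf}_{i\to\infty} I_{n_i}
 \le \operatorname*{p\text{-}lim\,sup}_{i\to\infty} I_{n_i}
 \le \overline{I}(\mathbf{X};\mathbf{Y})
 \le \sup_{\mathbf{X}} \overline{I}(\mathbf{X};\mathbf{Y})
 = \sup_{\mathbf{X}} \underline{I}(\mathbf{X};\mathbf{Y}),
\]
% where the last equality is the strong converse property (6.21); this is (6.23).
```

The outer inequalities hold because passing to a subsequence can only
raise a limit inferior in probability and lower a limit superior in
probability.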



Lemma 6.2

1) A general source $\mathbf{V}$ satisfies the semi-strong converse
property if and only if

$$\underline{R}_f(\mathbf{V}) = R_f(\mathbf{V}). \tag{6.24}$$

2) A general channel $\mathbf{W}$ satisfies the semi-strong converse
property if and only if

$$C(\mathbf{W}) = \overline{C}(\mathbf{W}). \tag{6.25}$$

Proof: It is obvious in view of Theorem 6.4, Theorem 6.5 and Remark
6.1. □



Remark 6.3 An operational equivalent of the notion of semi-strong converse
property is found in Vembu, Verdú and Steinberg [9]. Originally, Csiszár
and Körner [12] posed two operational standpoints in source coding and
channel coding, i.e., the pessimistic standpoint and the optimistic
standpoint. In their terminology, Lemma 6.2 states that, for source
coding, the semi-strong converse property is equivalent to the statement
that both the pessimistic standpoint and the optimistic standpoint result
in the same infimum of all achievable fixed-length source coding rates;
similarly, for channel coding, the semi-strong converse property is
equivalent to the claim that both the pessimistic standpoint and the
optimistic standpoint result in the same supremum of all achievable
channel coding rates. □



Thus, Theorem 6.6 together with Lemma 6.2 immediately yields the 
following stronger separation theorem of the traditional type: 

Theorem 6.7 Let either a general source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ or a general channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ satisfy the semi-strong converse
property. Then, the following two statements hold:

1) If $R_f(\mathbf{V}) < C(\mathbf{W})$, then the source $\mathbf{V}$ is
transmissible over the channel $\mathbf{W}$. In this case, we can separate
the source coding and the channel coding.

2) If the source $\mathbf{V}$ is transmissible over the channel
$\mathbf{W}$, then it must hold that
$R_f(\mathbf{V}) \le C(\mathbf{W})$. □



Example 6.1 Theorem 6.3 is an immediate consequence of Theorem 6.7 
together with Lemma 6.1. □ 

Example 6.2 Let us consider two different stationary memoryless sources
$\mathbf{V}_1 = \{V_1^n\}_{n=1}^{\infty}$,
$\mathbf{V}_2 = \{V_2^n\}_{n=1}^{\infty}$ with countably infinite source
alphabet $\mathcal{V}$, and define their mixed source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ by

$$P_{V^n}(\mathbf{v}) = \alpha_1 P_{V_1^n}(\mathbf{v}) + \alpha_2 P_{V_2^n}(\mathbf{v}) \quad (\mathbf{v} \in \mathcal{V}^n),$$

where $\alpha_1, \alpha_2$ are positive constants such that
$\alpha_1 + \alpha_2 = 1$. Then, this mixed source
$\mathbf{V} = \{V^n\}_{n=1}^{\infty}$ satisfies the semi-strong converse
property but neither the strong converse property nor the
information-stability.

Similarly, let us consider two different stationary memoryless channels
$\mathbf{W}_1 = \{W_1^n\}_{n=1}^{\infty}$,
$\mathbf{W}_2 = \{W_2^n\}_{n=1}^{\infty}$ with arbitrary abstract input and
output alphabets $\mathcal{X}, \mathcal{Y}$, and define their mixed channel
$\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ by

$$W^n(\mathbf{y}|\mathbf{x}) = \alpha_1 W_1^n(\mathbf{y}|\mathbf{x}) + \alpha_2 W_2^n(\mathbf{y}|\mathbf{x}) \quad (\mathbf{x} \in \mathcal{X}^n,\ \mathbf{y} \in \mathcal{Y}^n).$$

Then, this mixed channel $\mathbf{W} = \{W^n\}_{n=1}^{\infty}$ satisfies the
semi-strong converse property but neither the strong converse property nor
the information-stability.

Thus, in these mixed cases the separation theorem holds. □
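The behaviour of the mixed source can be seen numerically. The following
is a minimal sketch for a binary special case (two Bernoulli components
with illustrative weights 0.3 and 0.7; the paper allows a countably
infinite alphabet). The function name `mixed_spectrum_tail` and all
parameter values are assumptions made for the illustration; logarithms
are natural.

```python
import math

def mixed_spectrum_tail(n, rate, components):
    """Exact Pr{(1/n) log 1/P_{V^n}(V^n) >= rate} for a mixture of i.i.d. Bernoulli sources.

    components: list of (weight, p) pairs with weights summing to 1.
    """
    tail = 0.0
    for k in range(n + 1):  # k = number of ones in the block
        # log-probability of one fixed block with k ones under each component
        terms = [math.log(a) + k * math.log(p) + (n - k) * math.log(1 - p)
                 for a, p in components]
        top = max(terms)
        # log of the mixture probability of that block (log-sum-exp)
        log_prob_seq = top + math.log(sum(math.exp(t - top) for t in terms))
        if -log_prob_seq / n >= rate:
            # add the total mass of all C(n, k) blocks with k ones
            log_count = (math.lgamma(n + 1) - math.lgamma(k + 1)
                         - math.lgamma(n - k + 1))
            tail += math.exp(log_count + log_prob_seq)
    return tail

if __name__ == "__main__":
    comps = [(0.3, 0.1), (0.7, 0.4)]  # (weight, Bernoulli parameter)
    for rate in (0.2, 0.5, 0.8):      # nats
        print(rate, mixed_spectrum_tail(2000, rate, comps))
```

With these values the component entropies are roughly 0.325 and 0.673
nats. For a rate between them the tail settles near the weight 0.7 of the
higher-entropy component instead of tending to 0 or 1, so the spectrum
keeps mass at two distinct points and the limit inferior and limit
superior in probability differ, which is the failure of the strong
converse property stated in the example.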



References 

[1] C. E. Shannon, "A mathematical theory of communication," Bell System
Technical Journal, vol.27, pp.379-423, pp.623-656, 1948

[2] A. Feinstein, "A new basic theorem of information theory," IRE Trans.
PGIT, vol.4, pp.2-22, 1954

[3] R. B. Ash, Information Theory, Interscience Publishers, New York,
1965

[4] R. L. Dobrushin, "A general formulation of the fundamental Shannon
theorem in information theory," Uspehi Mat. Acad. Nauk. SSSR, vol.40,
pp.3-104, 1959: Translation in Transactions of American Mathematical
Society, Series 2, vol.33, pp.323-438, 1963

[5] M. S. Pinsker, Information and Information Stability of Random
Variables and Processes, Holden-Day, San Francisco, 1964

[6] G. D. Hu, "On Shannon theorem and its converse for sequence of
communication schemes in the case of abstract random variables," in
Trans. 3rd Prague Conference on Information Theory, Statistical Decision
Functions, Random Processes, Czechoslovak Academy of Sciences, Prague,
pp.285-333, 1964

[7] T. S. Han and S. Verdú, "Approximation theory of output statistics,"
IEEE Transactions on Information Theory, vol.IT-39, no.3, pp.752-772,
1993

[8] S. Verdú and T. S. Han, "A general formula for channel capacity,"
IEEE Transactions on Information Theory, vol.IT-40, no.4, pp.1147-1157,
1994

[9] S. Vembu, S. Verdú and Y. Steinberg, "The source-channel separation
theorem revisited," IEEE Transactions on Information Theory, vol.IT-41,
no.1, pp.44-54, 1995

[10] S. Verdú and T. S. Han, "The role of the asymptotic equipartition
property in noiseless source coding," IEEE Transactions on Information
Theory, vol.IT-43, no.3, pp.847-857, 1997

[11] T. S. Han, Information-Spectrum Methods in Information Theory,
Springer Verlag, New York, 2003

[12] I. Csiszár and J. Körner, Information Theory: Coding Theorems for
Discrete Memoryless Systems, Academic Press, New York, 1981

[13] T. M. Cover and J. Thomas, Elements of Information Theory, Wiley,
New York, 1991

[14] P. N. Chen and F. Alajaji, "Optimistic Shannon coding theorems for
arbitrary single-user systems," IEEE Transactions on Information
Theory, IT-45, pp.2623-2629, 1999






